Collaborating Authors

 Ilya Sutskever



Large Memory Layers with Product Keys

Guillaume Lample, Alexandre Sablayrolles, Marc'Aurelio Ranzato, Ludovic Denoyer, Herve Jegou

Neural Information Processing Systems

This paper introduces a structured memory which can be easily integrated into a neural network. The memory is very large by design and significantly increases the capacity of the architecture, by up to a billion parameters with a negligible computational overhead.
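The core trick behind this memory is the product-key lookup: the query is split into two halves, each half is scored against a small set of sub-keys, and the Cartesian product of the two top-k lists yields candidate full keys, so n*n memory slots can be searched while scoring only O(n + k*k) candidates. A minimal NumPy sketch of this idea (function names and shapes are illustrative assumptions, not the paper's code):

```python
import numpy as np

def product_key_lookup(q, K1, K2, values, k):
    """Hypothetical sketch of a product-key memory read.

    q:      query of dimension d, split into two halves.
    K1, K2: two sub-key sets of shape (n, d/2); together they
            define n*n full keys without materializing them.
    values: value table of shape (n*n, dv).
    """
    n = K1.shape[0]
    d = q.shape[0]
    q1, q2 = q[: d // 2], q[d // 2 :]
    s1, s2 = K1 @ q1, K2 @ q2                 # (n,) scores per half
    i1 = np.argsort(s1)[-k:]                  # top-k sub-keys, first half
    i2 = np.argsort(s2)[-k:]                  # top-k sub-keys, second half
    cand = s1[i1][:, None] + s2[i2][None, :]  # (k, k) candidate full-key scores
    flat = cand.ravel()
    top = np.argsort(flat)[-k:]               # overall top-k among k*k candidates
    rows, cols = np.unravel_index(top, (k, k))
    idx = i1[rows] * n + i2[cols]             # flat indices into the n*n slots
    w = np.exp(flat[top] - flat[top].max())
    w /= w.sum()                              # softmax weights over selected slots
    return w @ values[idx], idx
```

Because only the two sub-key sets are scored exhaustively, memory size grows quadratically in n while lookup cost stays near-linear, which is what lets the layer scale to a very large number of parameters cheaply.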


Sequencer: Deep LSTM for Image Classification

Neural Information Processing Systems

The output of each block is processed by a BiLSTM. Following the previous study, we adopt the AdamW optimizer; the batch sizes for Sequencer2D-S, Sequencer2D-M, and Sequencer2D-L are 2048, 1536, and 1024, respectively.
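The Sequencer2D idea is to replace self-attention with bidirectional recurrences that mix information along the vertical and horizontal axes of a feature map, fused by a point-wise projection. A minimal NumPy sketch under that assumption (parameter shapes and names are illustrative, not the paper's implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm(xs, Wx, Wh, b):
    """Vanilla LSTM over a sequence xs of shape (T, D); returns (T, H)."""
    H = Wh.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    out = []
    for x in xs:
        i, f, g, o = np.split(Wx @ x + Wh @ h + b, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g
        h = o * np.tanh(c)
        out.append(h)
    return np.array(out)

def bilstm(xs, pf, pb):
    """Concatenate forward and backward LSTM states: (T, 2H)."""
    return np.concatenate([lstm(xs, *pf), lstm(xs[::-1], *pb)[::-1]], axis=-1)

def sequencer2d_block(X, vert, horiz, Wp):
    """Mix a (H, W, C) feature map with BiLSTMs along both spatial axes,
    then fuse the two directions with a point-wise projection back to C."""
    H, W, _ = X.shape
    v = np.stack([bilstm(X[:, j], *vert) for j in range(W)], axis=1)   # (H, W, 2h)
    hm = np.stack([bilstm(X[i], *horiz) for i in range(H)], axis=0)    # (H, W, 2h)
    return np.concatenate([v, hm], axis=-1) @ Wp                       # (H, W, C)
```

Unlike attention, the cost of each recurrence is linear in the side length of the feature map, which is the main efficiency argument for LSTM-based mixing.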





Incorporating BERT into Parallel Sequence Decoding with Adapters

Neural Information Processing Systems

While large-scale pre-trained language models such as BERT [5] have achieved great success on various natural language understanding tasks, how to efficiently and effectively incorporate them into sequence-to-sequence models and the corresponding text generation tasks remains a non-trivial problem.
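The adapter approach referenced in the title keeps the pre-trained weights frozen and trains only small bottleneck modules inserted between layers. A minimal sketch of one such module, assuming a simple ReLU bottleneck with a residual connection (activation choice and parameter names are assumptions; adapter variants differ):

```python
import numpy as np

def adapter(h, W_down, b_down, W_up, b_up):
    """Hypothetical bottleneck adapter: down-project, nonlinearity,
    up-project, residual. Only these few parameters would be trained;
    the surrounding pre-trained weights stay frozen.
    """
    z = np.maximum(0.0, h @ W_down + b_down)  # ReLU bottleneck (an assumption)
    return h + z @ W_up + b_up                # residual preserves the frozen model's signal
```

Because the bottleneck dimension is much smaller than the hidden size, the number of trainable parameters per layer stays tiny, and with zero-initialized up-projection the module starts as an identity mapping, so fine-tuning begins from the unchanged pre-trained model.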